Bayesian Subtree Alignment Model based on Dependency Trees

Authors

  • Toshiaki Nakazawa
  • Sadao Kurohashi
Abstract

Word sequential alignment models work well for similar language pairs, but they are inadequate for distant language pairs. It is difficult to align words or phrases of distant languages with high accuracy without structural information about the sentences. In this paper, we propose a Bayesian subtree alignment model that incorporates dependency relations between subtrees in the dependency tree structures on both sides. The dependency relation model is a kind of tree-based reordering model and can handle non-local reorderings, which sequential word-based models often cannot handle properly. The model is also capable of handling multilevel structures, making it possible to find many-to-many correspondences automatically without any heuristic rules. The size of the structures is controlled by nonparametric Bayesian priors. Experimental alignment results show that our model improves alignment error rate for English-Japanese by 3.5 points over the word sequential alignment model, verifying that dependency information is effective for structurally different language pairs.
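
The abstract reports its result as alignment error rate (AER) but does not say how the metric is computed. As a minimal sketch, assuming the commonly used definition with sure (S) and possible (P) gold links, AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), the metric can be computed as follows. The function name and the representation of a link as a pair of (source index, target index) are illustrative assumptions, not taken from the paper.

```python
def alignment_error_rate(predicted, sure, possible):
    """Compute AER for one set of predicted alignment links.

    Each link is a (source_index, target_index) pair (an assumed
    representation). Sure links are treated as a subset of possible links.
    """
    a = set(predicted)
    s = set(sure)
    p = set(possible) | s  # every sure link is also a possible link
    if not a and not s:
        return 0.0  # nothing to predict and nothing predicted
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# Hypothetical toy data: two sure gold links, one of which is recovered.
gold_sure = {(0, 0), (1, 2)}
gold_possible = set()
system_links = {(0, 0), (1, 1)}
aer = alignment_error_rate(system_links, gold_sure, gold_possible)
```

Lower values are better, so a 3.5-point improvement corresponds to a drop of 0.035 in this quantity when it is expressed as a fraction.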

Similar resources

Bitext Dependency Parsing with Bilingual Subtree Constraints

This paper proposes a dependency parsing method that uses bilingual constraints to improve the accuracy of parsing bilingual texts (bitexts). In our method, a target-side tree fragment that corresponds to a source-side tree fragment is identified via word alignment and mapping rules that are automatically learned. Then it is verified by checking the subtree list that is collected from large scal...

An Improved Extraction Pattern Representation Model for Automatic IE Pattern Acquisition

Several approaches have been described for the automatic unsupervised acquisition of patterns for information extraction. Each approach is based on a particular model for the patterns to be acquired, such as a predicate-argument structure or a dependency chain. The effect of these alternative models has not been previously studied. In this paper, we compare the prior models and introduce a new ...

A generalization of Profile Hidden Markov Model (PHMM) using one-by-one dependency between sequences

The Profile Hidden Markov Model (PHMM) can be poor at capturing dependency between observations because of the statistical assumptions it makes. To overcome this limitation, the dependency between residues in a multiple sequence alignment (MSA) which is the representative of a PHMM can be combined with the PHMM. Based on the fact that sequences appearing in the final MSA are written based on th...

Discovering Constrained Substructures in Bayesian Trees Using the E.M. Algorithm

In this paper, we present an Expectation-Maximization learning algorithm (E.M.) for estimating parameters of partially-constrained Bayesian trees. The Bayesian trees considered here consist of an unconstrained subtree and a set of constrained subtrees. In this tree structure, constraints are imposed on some of the parameters of the parametrized conditional distributions, such that all condition...

EBMT System of Kyoto University in OLYMPICS Task at IWSLT 2012

This paper describes the EBMT system of Kyoto University that participated in the OLYMPICS task at IWSLT 2012. When translating very different language pairs such as Chinese-English, it is very important to handle sentences in tree structures to overcome the difference. Many recent studies incorporate tree structures in some parts of translation process, but not all the way from model training ...

Publication date: 2011